Smoothing Methods and Cross-Language Document Re-ranking
Abstract
This paper reports on our participation in the CLEF 2009 monolingual and bilingual ad hoc TEL@CLEF task, which involved three languages: English, French and German. Language modeling was adopted as the underlying information retrieval model. Because the data collection is extremely sparse, smoothing is particularly important when estimating a language model. The main purpose of the monolingual tasks was to compare different smoothing strategies and investigate the effectiveness of each. The retrieval model was then combined, for both the monolingual and bilingual tasks, with a document re-ranking method based on Latent Dirichlet Allocation (LDA), which exploits the implicit structure of the documents with respect to the original queries. Experimental results showed that the three smoothing strategies behave differently across the test languages, while the LDA-based document re-ranking method needs further investigation before it can bring significant improvement over the baseline language modeling systems in the cross-language setting.
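The abstract does not name the three smoothing strategies it compares. As a minimal sketch only, the Python below assumes the three that are most commonly compared in language-model retrieval (Jelinek-Mercer, Dirichlet prior, and absolute discounting) and ranks toy documents by smoothed query log-likelihood. The function names, parameter values (`lam`, `mu`, `delta`), and the two toy documents are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

# Each function returns a smoothed P(term | document).
# tf: term count in the document, doc_len: document length in tokens,
# n_unique: number of distinct terms in the document,
# p_coll: P(term | collection). Parameter defaults are illustrative only.

def jelinek_mercer(tf, doc_len, n_unique, p_coll, lam=0.5):
    # Fixed linear interpolation of document and collection models.
    return (1 - lam) * (tf / doc_len) + lam * p_coll

def dirichlet_prior(tf, doc_len, n_unique, p_coll, mu=2000.0):
    # Collection model acts as a prior worth mu pseudo-counts.
    return (tf + mu * p_coll) / (doc_len + mu)

def absolute_discounting(tf, doc_len, n_unique, p_coll, delta=0.7):
    # Subtract delta from every seen count, give the freed mass to p_coll.
    return (max(tf - delta, 0.0) + delta * n_unique * p_coll) / doc_len

def score(query, doc, coll_counts, coll_len, smooth):
    """Query log-likelihood of `query` under the smoothed model of `doc`."""
    counts = Counter(doc)
    total = 0.0
    for term in query:
        if term not in coll_counts:
            continue  # terms unseen in the whole collection are skipped
        p_coll = coll_counts[term] / coll_len
        total += math.log(smooth(counts[term], len(doc), len(counts), p_coll))
    return total

# Hypothetical toy collection, standing in for sparse catalogue records.
docs = {
    "d1": ["library", "catalog", "history"],
    "d2": ["history", "of", "france", "history"],
}
coll_counts = Counter(t for d in docs.values() for t in d)
coll_len = sum(coll_counts.values())
query = ["france", "history"]

for smooth in (jelinek_mercer, dirichlet_prior, absolute_discounting):
    ranking = sorted(
        docs,
        key=lambda d: score(query, docs[d], coll_counts, coll_len, smooth),
        reverse=True,
    )
    print(smooth.__name__, ranking)
```

All three estimators give nonzero probability to query terms missing from a document, which matters most when documents are very short, as with the sparse records the abstract describes.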
Similar resources
Language Modeling and Document Re-Ranking: Trinity Experiments at TEL@CLEF-2009
This paper presents a report on our participation in the CLEF-2009 monolingual and bilingual ad hoc TEL@CLEF tasks involving three different languages: English, French and German. Language modeling is adopted as the underlying information retrieval model. While the data collection is extremely sparse, smoothing is particularly important when estimating a language model. The main purpose of the mo...
YJST at the NTCIR-12 MobileClick-2 Task
The Yahoo Japan Search Technology (YJST) team participated in the Japanese iUnit Ranking and Summarization subtasks of NTCIR-12 MobileClick-2. For the iUnit Ranking subtask, we adopted an LM-based approach, implemented on top of the organizers’ baseline system. We examined language-model-based iUnit ranking using both KL-divergence and negative cross entropy with several model smoothing meth...
The Smoothed Dirichlet Distribution: Understanding Cross-entropy Ranking in Information Retrieval
September 2006. Ramesh M. Nallapati (B.Tech., Indian Institute of Technology, Bombay; M.S., University of Massachusetts Amherst; Ph.D., University of Massachusetts Amherst). Directed by: Prof. James Allan. Unigram language modeling is a successful probabilistic fr...
The Effectiveness of Results Re-Ranking and Query Expansion in Cross-language Information Retrieval
This paper presents the technical details and experimental results of the information retrieval system with which we participated in the NTCIR-8 ACLIA (Advanced Cross-language Information Access) IR4QA (Information Retrieval for Question Answering) task. Document corpora in Simplified Chinese (CS) and Traditional Chinese (CT), with topics in English, CS and CT, were used in our experiments. We com...
Dedicated Backing-Off Distributions for Language Model Based Passage
Passage retrieval is an essential part of question answering systems. In this paper we use statistical language models to perform this task. Previous work has shown that language modeling techniques provide better results for both document and passage retrieval. The motivation behind this paper is to define new smoothing methods for passage retrieval in question answering systems. The final ob...